We consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare the result to an Extended Kalman Filter to estimate robot trajectories. We validate all models on a real robotic system.